18 research outputs found

    3D Gaze Estimation from Remote RGB-D Sensors

    The development of systems able to retrieve and characterise the state of humans is important for many applications and fields of study. In particular, as a display of attention and interest, gaze is a fundamental cue in understanding people's activities, behaviors, intentions, state of mind and personality. Moreover, gaze plays a major role in the communication process, for example in showing attention to the speaker, indicating who is addressed, or averting gaze to keep the floor. Therefore, many applications within the fields of human-human, human-robot and human-computer interaction could benefit from gaze sensing. However, despite significant advances during more than three decades of research, current gaze estimation technologies cannot address the conditions often required within these fields, such as remote sensing, unconstrained user movements and minimum user calibration. Furthermore, to reduce cost, it is preferable to rely on consumer sensors, but this usually leads to low resolution and low contrast images that current techniques can hardly cope with. In this thesis we investigate the problem of automatic gaze estimation under head pose variations, low resolution sensing and different levels of user calibration, including the uncalibrated case. We propose to build a non-intrusive gaze estimation system based on remote consumer RGB-D sensors. In this context, we propose algorithmic solutions which overcome many of the limitations of previous systems. We thus address the main aspects of this problem: 3D head pose tracking, 3D gaze estimation, and gaze-based application modeling. First, we develop an accurate model-based 3D head pose tracking system which adapts to the participant without requiring explicit actions. Second, to achieve head pose invariant gaze estimation, we propose a method to correct the eye image appearance variations due to head pose. We then investigate two different methodologies to infer the 3D gaze direction.
The first one builds upon machine learning regression techniques. In this context, we propose strategies to improve their generalization, in particular to handle different people. The second methodology is a new paradigm we propose and call geometric generative gaze estimation. This novel approach combines the benefits of geometric eye modeling (normally restricted to high resolution images due to the difficulty of feature extraction) with a stochastic segmentation process (adapted to low resolution) within a Bayesian model. The model decouples user-specific geometry from session-specific appearance parameters and introduces priors, which are appropriate for adaptation relying on small amounts of data. The aforementioned gaze estimation methods are validated through extensive experiments on a comprehensive database which we collected and made publicly available. Finally, we study the problem of automatic gaze coding in natural dyadic and group human interactions. The system builds upon the thesis contributions to handle unconstrained head movements and the lack of user calibration. It further exploits the 3D tracking of participants and their gaze to conduct a 3D geometric analysis within a multi-camera setup. Experiments on real and natural interactions demonstrate that the system is highly accurate. Overall, the methods developed in this dissertation are suitable for many applications, involving large diversity in terms of setup configuration, user calibration and mobility

    Person Independent 3D Gaze Estimation From Remote RGB-D Cameras

    We address the problem of person-independent 3D gaze estimation using a remote, low resolution, RGB-D camera. The approach relies on a sparse technique to reconstruct normalized eye test images from a gaze appearance model (a set of eye image/gaze pairs) and infer their gaze accordingly. In this context, the paper makes three contributions: (i) unlike most previous approaches, we exploit the coupling (and constraints) between both eyes to infer their gaze jointly; (ii) we show that a generic gaze appearance model built from the aggregation of person-specific models can be used to handle unseen users and compensate for appearance variations across people, since a test user's eye appearance is reconstructed from similar users within the generic model; (iii) we propose an automatic model selection method that leads to comparable performance with a reduced computational load

    Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras

    We propose a head pose invariant gaze estimation model for distant RGB-D cameras. It relies on a geometric understanding of the 3D gaze action and generation of eye images. By introducing a semantic segmentation of the eye region within a generative process, the model (i) avoids the critical feature tracking of geometrical approaches requiring high resolution images; (ii) decouples the person-dependent geometry from the ambient conditions, allowing adaptation to different conditions without retraining. Priors in the generative framework are adequate for training from few samples. In addition, the model is capable of gaze extrapolation, allowing for less restrictive training schemes. Comparisons with state-of-the-art methods validate these properties, which make our method highly valuable for addressing many diverse tasks in sociology, HRI and HCI

    Gaze Estimation From Multimodal Kinect Data

    This paper addresses the problem of free gaze estimation under unrestricted head motion. More precisely, unlike previous approaches that mainly focus on estimating gaze towards a small planar screen, we propose a method to estimate the gaze direction in 3D space. In this context the paper makes the following contributions: (i) leveraging the Kinect device, we propose a multimodal method that relies on depth sensing to obtain robust and accurate head pose tracking even under large head pose, and on the visual data to obtain the remaining eye-in-head gaze directional information from the eye image; (ii) a rectification scheme of the image that exploits the 3D mesh tracking, making it possible to conduct a head pose free eye-in-head gaze directional estimation; (iii) a simple way of collecting ground truth data thanks to the Kinect device. Results on three users demonstrate the great potential of our approach
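
    The split described above, head pose from depth plus eye-in-head gaze from the eye image, can be illustrated with a minimal sketch of how the two estimates might be composed into one 3D gaze direction. The yaw/pitch parameterization, the axis convention (zero angles look along +z) and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math

def angles_to_vector(yaw, pitch):
    """Unit direction for yaw (about the y axis) and pitch, in radians.

    Assumed convention: yaw = pitch = 0 looks along +z.
    """
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )

def yaw_rotation(yaw):
    """3x3 rotation matrix about the y axis (a stand-in for a full head pose)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def rotate(R, v):
    """Apply the 3x3 rotation matrix R to the 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def gaze_in_world(head_rotation, eye_yaw, eye_pitch):
    """Map an eye-in-head gaze direction into the sensor frame.

    head_rotation: head-to-sensor rotation (hypothetical tracker output).
    eye_yaw, eye_pitch: gaze angles expressed in the head frame.
    """
    return rotate(head_rotation, angles_to_vector(eye_yaw, eye_pitch))

# Head turned 30 degrees, eyes turned a further 15 degrees in the head.
gaze = gaze_in_world(yaw_rotation(math.radians(30)), math.radians(15), 0.0)
```

    In this sketch the head rotation would come from the depth-based tracker and the two eye angles from the appearance-based estimator, so errors in either stage propagate directly to the final direction.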

    3D Gaze Tracking and Automatic Gaze Coding from RGB-D Cameras

    Gaze is recognised as one of the most important cues for the analysis of the cognitive behaviors of a person such as the attention displayed towards objects or people, their interactions, functionality and causality patterns. In this short paper, we present our investigations towards the development of 3D gaze sensing solutions from consumer RGB-D sensors, including their use for the inference of visual attention in natural dyadic interactions and the resources we have made or will make available to the community

    EYEDIAP: A Database for the Development and Evaluation of Gaze Estimation Algorithms from RGB and RGB-D Cameras

    The lack of a common benchmark for the evaluation of the gaze estimation task from RGB and RGB-D data is a serious limitation for distinguishing the advantages and disadvantages of the many proposed algorithms found in the literature. This paper intends to overcome this limitation by introducing a novel database along with a common framework for the training and evaluation of gaze estimation approaches. In particular, we have designed this database to enable the evaluation of the robustness of algorithms with respect to the main challenges associated with this task: i) head pose variations; ii) person variation; iii) changes in ambient and sensing conditions; and iv) types of target: screen or 3D object

    EYEDIAP Database: Data Description and Gaze Tracking Evaluation Benchmarks

    The lack of a common benchmark for the evaluation of the gaze estimation task from RGB and RGB-D data is a serious limitation for distinguishing the advantages and disadvantages of the many proposed algorithms found in the literature. The EYEDIAP database intends to overcome this limitation by providing a common framework for the training and evaluation of gaze estimation approaches. In particular, this database has been designed to enable the evaluation of the robustness of algorithms with respect to the main challenges associated with this task: i) head pose variations; ii) person variation; iii) changes in ambient and sensing conditions; and iv) types of target: screen or 3D object. This technical report contains an extended description of the database. It covers the processing methodology for the elements provided along with the raw data, the database organization, and additional benchmarks we consider relevant to evaluate diverse properties of a given gaze estimator

    A Semi-Automated System for Accurate Gaze Coding in Natural Dyadic Interactions

    In this paper we propose a system capable of accurately coding gazing events in natural dyadic interactions. Contrary to previous works, our approach exploits the actual continuous gaze direction of a participant by leveraging remote RGB-D sensors and a head pose-independent gaze estimation method. Our contributions are: i) we propose a system setup built from low-cost sensors and a technique to easily calibrate these sensors in a room with minimal assumptions; ii) we propose a method which, provided short manual annotations, can automatically detect gazing events in the rest of the sequence; iii) we demonstrate on substantially long, natural dyadic data that high accuracy can be obtained, showing the potential of our system. Our approach is non-invasive and does not require collaboration from the interactors. These characteristics are highly valuable in psychology and sociology research
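
    One simple way to turn a continuous 3D gaze direction into a gazing event is an angular test against the target position. The sketch below shows that idea only; it is not the paper's detection method, which relies on short manual annotations. The function names and the 10 degree threshold are hypothetical, and a zero-length direction is not handled.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))

def is_gazing_at(eye_pos, gaze_dir, target_pos, max_angle_deg=10.0):
    """True if the gaze ray points within max_angle_deg of the target.

    eye_pos, target_pos: 3D points in a shared room frame (assumed to be
    available from sensor calibration).
    gaze_dir: estimated gaze direction in the same frame.
    max_angle_deg: hypothetical tolerance, to be tuned on annotated data.
    """
    to_target = tuple(t - e for t, e in zip(target_pos, eye_pos))
    return angle_between(gaze_dir, to_target) <= max_angle_deg

# Participant looks straight ahead; the partner's head is slightly off-axis.
looking = is_gazing_at((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.1, 0.0, 1.0))
```

    A per-frame test like this would typically be smoothed over time before being reported as an event, and the threshold could be fitted from the short manual annotations the abstract mentions.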

    Who Will Get the Grant ? A Multimodal Corpus for the Analysis of Conversational Behaviours in Group

    In the last couple of years more and more multimodal corpora have been created. Recently many of these corpora have also included data from RGB-D sensors. However, there is to our knowledge no publicly available corpus that combines accurate gaze tracking and high-quality audio recording for group discussions of varying dynamics. With a corpus that fulfilled these needs, it would be possible to investigate higher level constructs such as group involvement, individual engagement or rapport, which all require multimodal feature extraction. In this paper we describe the design and recording of such a corpus and we provide some illustrative examples of how such a corpus might be exploited in the study of group dynamics

    3D Head Pose and Gaze Tracking and Their Application to Diverse Multimodal Tasks

    This PhD thesis addresses the problem of 3D head pose and gaze tracking with minimal user cooperation. By exploiting characteristics of RGB-D sensors, contributions have been made to problems that follow from the lack of cooperation, in particular head pose and inter-person appearance variability, as well as low resolution handling. The resulting system enabled diverse multimodal applications. In particular, recent work combined multiple RGB-D sensors to detect gazing events in dyadic interactions. The research plan consists of: i) improving the robustness, accuracy and usability of the head pose and gaze tracking system; ii) using additional multimodal cues, such as speech and dynamic context, to train and adapt gaze models in an unsupervised manner; iii) extending the application of 3D gaze estimation to diverse multimodal applications. This includes visual focus of attention tasks involving multiple visual targets, e.g. people in a meeting-like setup